ABSTRACT

Suid herpesvirus 1 (SuHV-1) is the causative agent of pseudorabies (PR), a disease of great importance due 
to the huge losses it causes in the swine industry. The aim of this study was to determine a method for 
genotyping SuHV-1 based on partial sequences of the gene coding for glycoprotein C (gC) and to elucidate 
the possible reasons for the variability of this region. A total of 109 gC sequences collected from GenBank
were divided into five major groups after reconstruction of a phylogenetic tree by Bayesian inference. The 
analysis showed that a portion of gC (approximately 671 bp) is under selective pressure at various points 
that coincide with regions of protein disorder. It was also possible to divide SuHV-1 into five genotypes that 
evolved under different selective pressures. These genotypes are not specific to countries or continents, 
perhaps due to multiple introduction events related to the importation of swine. 

INTRODUCTION

Suid herpesvirus 1 (SuHV-1) is the causative agent of 
pseudorabies (PR), a disease of major importance to the swine
industry because it causes considerable losses in the production
chain. PR is listed as one of the most important diseases 
affecting pigs in the Manual of Diagnostic Tests and Vaccines 
for Terrestrial Animals of the World Organization for Animal 
Health (34). SuHV-1 is a member of the Herpesviridae family, 
subfamily Alphaherpesvirinae, and has only one serotype. The 
natural host of this virus is the pig; however, it can infect other 
domestic animals such as dogs and cattle, causing encephalitis 
with almost 100% lethality (19). 

SuHV-1 is classified into different genotypes based on
restriction endonuclease analysis (3), but more recent studies
conducted in Brazil, the United States and Europe have
characterized viral strains based on partial sequencing of UL44,
focusing on the region encoding the N-terminal portion of
glycoprotein C (10, 9, 20, 12). The UL44 gene is one of the
most variable regions of the SuHV-1 genome (17). The gene 
encodes glycoprotein C (gC), which is the main component 
involved in adhesion to host cell receptors and is considered to 
be a potent inducer of the immune response. The protein 
contains eight N-glycosylation sites and three redundant 
heparin-binding domains (HBDs) (32). This glycoprotein is 
involved in two distinct steps of virus adhesion to host cells. 
The first step is an initial, low-affinity interaction between gC 
and cellular heparin-like receptors, followed by a second 
interaction that results in a more stable binding of the virus to
the cells (36). 

The use of molecular data such as nucleotide and amino 
acid sequences is an essential tool for understanding the 
variability and epidemiology of the virus. These tools were 
used to analyze outbreaks occurring in areas of high swine 
production in Brazil between 1983 and 2003 (9) and in the 
United States in 1989 (10). Another relevant analysis is the 
genetic profiling of the strains circulating in wild boars or pigs 
(10, 12) or the investigation of the spread of live vaccine 
strains among feral pigs (12). These surveys found a high 
degree of conservation between virus sequences and did not 
indicate a correlation between outbreak location and 
phylogenetic groups for SuHV-1. Another common feature in 
this field is that most research is restricted to the study of 
SuHV-1 genetic variation in specific territories and does not 
involve other methods of tree reconstruction besides
neighbor-joining (NJ).

Other methods such as maximum likelihood (ML) and 
Bayesian inference (BI) have been used for phylogenetic 
analysis based on amino acid and nucleotide substitutions.
These methods have been used in several studies, including 
molecular biology analyses of viruses from various countries 
such as genotyping of bovine leukemia virus, and have 
produced useful and interesting new data (26). These 
bioinformatic tools can also be used to study polymorphism 
and selective pressure in SuHV-1 nucleotide sequences. 

The aim of this study was to use new bioinformatics tools
such as ML and BI to analyze partial sequences of the
UL44 gene available in GenBank. This analysis will lead to
a better understanding of the relationship between SuHV-1 
isolates from different regions of the globe and the selective 
pressure and polymorphism found in each group. 

MATERIALS AND METHODS 

SuHV Sequences 

We obtained complete or partial nucleotide sequences of
SuHV-1 UL44 from GenBank. All UL44 sequences were
analyzed in previous studies (10, 9, 20, 12), with the exception
of sequences from China, South Korea and Malaysia. The 109
sequences were named according to the name of the isolate in
GenBank, followed by a three-letter code to identify the
country: Germany (GER), Brazil (BRA), China (CHI), South
Korea (SKO), Slovakia (SLK), Spain (ESP), United States
(USA), Japan (JAP), Hungary (HUN), Northern Ireland (NIR) 
and Malaysia (MAL). In addition, three sequences are present 
for the Bartha vaccine: one from a complete genome sequence, 
one from Brazil and another from China. 

In addition, we sequenced 12 SuHV-1 samples from 
outbreaks occurring between 2002 and 2003 in the state of 
Santa Catarina in Brazil in accordance with Goldberg et al. 
(10). 

SuHV-1 Phylogenetic reconstruction 

Nucleotide sequences were submitted to three programs 
for the reconstruction of phylogenetic trees. The MEGA 4.0 
program (18) was used to reconstruct a phylogeny by neighbor 
joining (using the maximum composite likelihood model) with 
1000 bootstrap replicates (31). The best model for the 
reconstruction of phylogenetic trees was selected using the 
jModelTest program. The chosen model was TrN93 (Tamura-Nei) with
gamma distribution and optimized frequencies of substitution 
(24). 

After choosing the model, reconstruction of the phylogeny 
by the method of maximum likelihood was performed using 
the Seaview program (11). The parameters generated by the 
jModelTest program were also included in the MrBayes 
program to create trees using the Bayesian method (16). 
Additionally, using MEGA 4.0, the mean distances between 
and within the groups formed in the phylogenetic trees were 
determined based on the number of differences. 
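
The within-group and between-group mean distances described above can be illustrated with a minimal Python sketch. This is not the MEGA 4.0 implementation; the toy alignment, the group labels and the choice to skip gap positions pair by pair are assumptions made only for illustration.

```python
from itertools import combinations, product


def n_differences(a, b):
    """Count differing positions between two aligned sequences.

    Positions with a gap ('-') in either sequence are skipped
    (an assumption of this sketch, not a documented MEGA setting).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")


def mean_within(seqs):
    """Mean number of differences over all pairs inside one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1, group2):
    """Mean number of differences over all pairs drawn across two groups."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must contain at least one sequence")
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment; these are not SuHV-1 sequences.
groups = {
    "A": ["ACGTACGT", "ACGTACGT", "ACGTACGA"],
    "B": ["ACGTTCGA", "ACGATCGA"],
}

if __name__ == "__main__":
    for name, seqs in groups.items():
        print(name, "within:", round(mean_within(seqs), 3))
    print("A vs B:", round(mean_between(groups["A"], groups["B"]), 3))
```

A grouping is considered well separated when every within-group mean is smaller than the between-group means, which is the comparison reported in Table 1.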

Evolutionary history of Brazilian SuHV-1 isolates 

In addition to the reconstruction of phylogenetic trees
described above, the Brazilian samples were used to trace the
history of the entry of SuHV-1 into the country. These samples
were selected because our group has greater knowledge of the
isolates and of the history of PR in Brazil than of those in
other nations. Brazilian sequences and sequences from other
countries with known dates of isolation, drawn from each group
identified previously, were submitted to the BEAST program (5) with
substitution models defined by jModelTest. Codons were
grouped into three partitions, and the substitution model was 
unlinked across codon positions. Each analysis was run such 
that the effective sample size was greater than 200. Samples 
from group E were not included because we could not find any 
information about the date of isolation of the virus sequenced. 

Predictive analysis of translated sequences 

All analyses of the translated sequences were generated 
using strain NIA-3 as a reference (GenBank accession number 
D49437). Sequences for the Shope and Bartha strains were also 
included in this analysis. 

Sequences from the five groups were uploaded to the 
ELM Functional Sites in Proteins website (available at 
http://elm.eu.org/) for analysis and confirmation of 
glycosylation domains. Analysis of hydrophobicity and 
hydrophilicity was performed using the Kyte and Doolittle 
mean hydrophobicity method (seven-residue scan window 
without gaps) and the Hopp and Woods mean hydrophilicity
method (six-residue scan window without gaps) in the BioEdit 
program. The analysis focused on the signal peptide, formed by 
the first 22 amino acids, six glycosylation sites and three 
heparin-binding domains. 
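
The seven-residue scan described above can be sketched in Python as a sliding-window mean over the Kyte and Doolittle hydropathy scale. This is an illustrative sketch, not the BioEdit implementation; the function name and the example peptide are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq, window=7):
    """Return the mean hydropathy of every full window along seq.

    The sequence must be ungapped and contain only the 20 standard
    residues; any other character raises a KeyError.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide used only to show the call; not a gC sequence.
    peptide = "MASLARAVVVEGTS"
    for start, value in enumerate(mean_hydropathy(peptide), start=1):
        print(start, round(value, 2))
```

Positive window means indicate hydrophobic stretches and negative means indicate hydrophilic ones, so a substitution that replaces valines with aspartates lowers the profile in every window that covers it.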

The SIFT program (available at http://blocks.fhcrc.org/sift/SIFT.html)
(21) was used to predict the degree of tolerance
of amino acid substitutions. Analysis of the influence of amino
acid substitutions on the secondary structure and solvent
accessibility was performed using the nnPredict (available at
http://alexander.compbio.ucsf.edu/~nomi/nnpredict.html) and
Scratch (available at http://www.ics.uci.edu/~baldig/scratch/index.html)
programs. The domain search was performed using
the ProDom program (available at
http://prodom.prabi.fr/prodom/current/html/home.php).

Identification of specific positive and negative selection sites 

The SuHV-1 UL44 sequences were searched for specific
positive and negative selection sites using the Selecton
program (30; available at http://selecton.bioinfo.tau.ac.il/).
The chosen model was M8, the default model for all
Selecton runs. This model allows for positive selection 
operating on the protein. The results were statistically verified 
against the null model M8a, which is similar to M8 except
that it does not allow for positive selection, so only
neutral and purifying selection are permitted. A likelihood
ratio test between the two models then shows which one fits the
data better. 
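
The likelihood ratio test between the nested models can be sketched as follows. This is not Selecton code. The log-likelihood values are hypothetical, and the sketch assumes that the test statistic is compared with a chi-square distribution with one degree of freedom, on the grounds that M8 has one more free parameter than M8a.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Compare a null model (e.g. M8a) with its alternative (e.g. M8).

    Returns the statistic 2 * (lnL_alt - lnL_null) and the p-value under
    a chi-square distribution with one degree of freedom (assumption).
    """
    # A nested alternative cannot fit worse in theory; clamp numerical noise.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, chosen only to illustrate the call.
    stat, p = likelihood_ratio_test(lnl_null=-2310.4, lnl_alt=-2304.1)
    print("LRT statistic:", round(stat, 2), "p-value:", p)
```

A small p-value means that allowing positive selection improves the fit significantly, which is the criterion applied to the pooled sequences and to each group separately.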

All sequences were first tested together in a single file. Afterward,
each group found in the phylogenetic tree was tested separately
to determine whether the groups are under different selective pressures.

RESULTS 

Phylogenetic tree reconstruction 

The results of the phylogenetic analysis showed several 
differences, especially among more distantly related samples 
(Figure 1). Some eastern samples were placed more distantly 
using NJ but were positioned next to a group of European 
isolates using ML and BI. Another difference in phylogeny was 
the peculiar position of IB341BRA, which was similar using 
NJ and BI but different using ML. The three sequences for the 
Bartha vaccine were positioned in different groups. BarthaBRA 
and BarthaPLOS were positioned in a group of European 
sequences, and BarthaCHI grouped with samples from China. 

Analyses using ML and BI formed five major groups 
supported by high bootstrap values and credibility (Fig. 1). 
These groups were not as clear using NJ. The groups were 
named A, B, C, D and E and were characterized primarily by a 
hotspot between residues 184 and 195 (Fig. 2). The sequences 
originating from viruses isolated in Europe were distributed
among all of the groups.

The distances calculated within each group were 0.1 for
A, 3.7 for B, 3.6 for C, 2.7 for D and 7.6 for E. All of these
distances were smaller than the distances calculated between
groups, as can be seen in Table 1.




Figure 1. Phylogenetic tree reconstructed
using Bayesian inference. The results
show the distribution of the samples into five
major groups supported by high probability 
values. The clusters were not distributed by 
geographical areas. However, such 
groupings show consistent results, and the 
discrepancies in the locations of the 
samples can be taken as evidence of 
different introductions of SuHV-1 into a 
country or region. 

Figure 2. Alignment of gC sequences provides evidence of hotspots in the protein, using the standard strain Shope as a reference. The
sequences represent the amino acid variation in each cluster. The figure only shows the amino acid substitutions that separate the SuHV-1
isolates in the five groups shown in this work.

Figure 3. Results from the Selecton software for the N-terminal region of gC. The color scale (ranging from positive selection to
purifying selection) indicates locations subject to positive, neutral or purifying selective pressure. The N-glycosylation sites are
marked in solid lines, and the HBDs are indicated by dashed lines. These sites are highly conserved, with amino acid substitutions
found only in group E. The hot spot located between residues 175 and 185 shows the highest degree of positive pressure.



Table 1. Distance values between SuHV-1 groups

Evolutionary history of Brazilian SuHV-1 isolates

The results of the analysis of the Brazilian samples 
allowed us to estimate the divergence time (in years) relative to 
other countries. The time of divergence was 264 years for 
groups C and D and 410 years for groups C and B. It was not 
possible to calculate a clear time of divergence between group 
A and any other group. 

The estimated distance between the most recent common 
ancestor of the isolates in the group with the highest prevalence 
in Brazil (D) and the most recent European sequence 
(614BWGER2008) was approximately one hundred years (Fig. 
2). The most recent common ancestor between the Brazilian
isolates found in group B and NIA-3 (the oldest sequence in
this group with a date of isolation described in the literature)
dates from 90 years ago.

Predictive analysis of translated sequences 

Translated gC sequences revealed that the glycosylation 
domains and heparin binding sites were highly conserved. 
ELM confirmed the glycosylation sites and the signal peptide. 
The signal peptide did not show any amino acid substitution in 
any sequence. In addition, no probable function was found for 
the hotspot (Figure 2) located between amino acids 180 and 
185, except for the presence of an additional disordered region 
for samples from group D. 
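
Conservation of N-glycosylation sites across translated sequences can be checked with a simple motif scan. The sketch below assumes the commonly described consensus sequon N-X-S/T, where X is any residue except proline. It is a simplification and does not reproduce the ELM method; the function name and the example peptide are hypothetical.

```python
import re

# Lookahead pattern so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def sites_conserved(reference, other):
    """True when two aligned, ungapped sequences share the same sequon positions."""
    return find_sequons(reference) == find_sequons(other)


if __name__ == "__main__":
    # Hypothetical peptides used only to show the calls; not gC sequences.
    print(find_sequons("MNGSANPTNNTS"))
    print(sites_conserved("AANGSAA", "AANGTAA"))
```

Comparing the positions returned for a reference strain with those of each isolate shows whether a substitution removes, adds or shifts a potential glycosylation site.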

The hot spot was also associated with changes in the 
hydrophobicity profile of the region. Groups C and D displayed 
a remarkable reduction in hydrophobicity. The same region has 
a difference of 0.5 between amino acids 190 and 195 in the 
Hopp and Woods Scale Mean Hydrophilicity Profile, which 
indicates that it is more antigenically exposed. 

A domain search using ProDom revealed homology with 
two glycoprotein domains from Marek's disease herpesvirus.
The first homolog is a transmembrane precursor signal
(accession number PD018038), and the second is predicted to
be an MHC II-recognized epitope (accession number
PD002483). The homologous area recognized by MHC II is
smaller by approximately 20 residues for both group D
samples.

Analysis of the tolerance of amino acid substitutions using 
the SIFT program predicted that all substitutions in the 
majority of the gC sequences are tolerable, with the exception 
of some substitutions in Bartha and IB341/86. Residue 43 from 
the vaccine sample is predicted to be tolerable; nonetheless, it 
does not appear in any other sequence from the database. 
IB341/86, on the other hand, displays an alanine-to-valine 
substitution at residue 60, which is listed as intolerable. 

The secondary structure of gC is modified by the substitutions,
especially at the hot spot and neighboring regions. The
secondary structures of most group A and B samples were
characterized by the presence of a beta sheet at the hot spot.
This strand was shortened by substitutions in Shope and
Bartha and was absent from the C and D groups. Adjacent
structures were also affected, with a reduction in the alpha- 
helix located immediately before the hot spot in standard and 
vaccine samples and an increase in the alpha-helix in group B. 

Identification of specific positive and negative selection sites 

The analysis performed by the Selecton program found 
sites of selective pressure in gC (Figure 3). When all of the 
sequences were tested together, the program detected selective 
pressure, with a likelihood ratio test between the two models 
showing a significance level of 0.001. The results were 
different when each group was tested separately. Groups A and 
C had sites under neutral or purifying selective pressure. Group 
B had sites under positive pressure, but the results were not 
significant relative to the entire sequence. The results indicated 
that group D is under positive selective pressure with a 
significance level of 0.05. 



DISCUSSION 

Comparative analysis of genome sequences is a highly 
important tool in epidemiological studies of infectious agents. 
Applications of this analysis can provide data ranging from 
gene and protein function to phylogenetic relationships 
between microorganisms. One such application is the use of
genome sequence comparison to determine
the molecular epidemiology of microorganisms. Such studies
have been performed both in human medicine, with viruses 
such as varicella-zoster virus (33) and dengue virus (29), and in 
veterinary medicine, with rabies virus (13), bovine leukemia 
virus (2) and bovine herpesvirus 1 and 5 (7). 

Recent publications on the molecular epidemiology of 
SuHV-1 used NJ for reconstruction of phylogenetic trees (10, 
9, 20, 12). NJ may not be the best method for this type of study 
because the tree reconstructed using NJ could not separate 
groups B and D, placing them in locations different from those 
observed using the BI and ML methods. These characteristics 
may be related to high similarity in the first case and the high 
degree of divergence in the second. The most likely 
explanations for the results using NJ are the difficulty in 
estimating reliable distances for highly divergent sequences 
and the loss of information (when the sequences are very 
similar) due to compression of the distance between sequences 
(14). 

BI and ML trees showed similar and more sub-divided 
patterns, including five distinct groups with high confidence 
values. BI was chosen here as a reference for the analysis due 
to the high values of reliability and versatility for the method. 
The same method has also been used to define genotypes in 
other virus species such as bovine leukemia virus (26) and 
yellow fever virus (4). 

The SuHV-1 clusters were not distributed by geographical 
areas. This result can be taken as evidence of multiple 
introductions of SuHV-1 into a country or region. European 
samples were more divergent than sequences from other 
countries and were located in all groups, indicating a more
ancient presence of the virus on that continent. These results
support the theory that the SuHV-1 strains isolated in the last
century originated in Europe and were distributed worldwide
through trade in the breeds most commonly
used in the swine industry. These breeds were all developed in
Europe and later used in other countries (1). 

The phylogenetic groups were not specific for 
geographical areas, but some sequences from the same country 
clustered together. Sequences from the United States, eastern 
Europe and Brazil were found in specific groups even when 
isolated at different times. One example is the Shope strain, 
isolated in 1942 in Hungary, which was located in group B 
with 576HUN and 563HUN, both isolated in 1996. These 
results show that even if the major phylogenetic groups are not 
specific to continents or countries, there are SuHV-1 strains 
specific to some regions. 

Among the phylogenetic groups, there is an unusual 
feature in group D. There are isolates from cows and dogs from 
the United States, Brazil, Japan, France and Germany in this 
cluster. Although isolates from these animals do not only group 
together in D, it is interesting to note the behavior of these 
sequences when constructing all of the trees and their emphasis 
in the discussions of the articles in which they were 
characterized (10, 12, 20). Among the unique characteristics 
found in these studies, several stand out. The 8044 North 
American isolate may be related to a vaccine sample (12). The 
French isolate 527 has genomic features of genotypes I and II, 
in accordance with the BamHI restriction enzyme profile of the
entire genome (20, 28). These results are suggestive of
recombination, which has also occurred with the Japanese 
sample Yamagata S81 (3). 

Goldberg et al. (10) suggest that the characteristics of 
samples positioned in group D in this study would be host 
adaptations. Research with other herpesviruses was unable to 
find specific nucleotide substitutions associated with 
transmission between species (22). The ability of SuHV-1 to
infect dogs and cattle does not suggest such adaptations
because these animals are terminal hosts, and the infection
process is fast and almost always fatal. These characteristics, 
coupled with the slow evolution of the virus, indicate that 
group D consists of atypical samples that are not the result of 
host adaptation. 

Almost all Brazilian samples isolated between 1954 and 
2003, before the implementation of the eradication plans in the 
largest pig raising areas, were found in group D. These results 
are indicative of a founder effect, in which an atypical SuHV-1
strain diverged from European strains approximately one
hundred years ago and spread throughout Brazil. The first reports
of Aujeszky's disease (AD, another name for PR) in Brazil date
from 1908 to 1912, in agreement with the
results presented here.

Differences in selective pressure found in this work can be 
explained by environmental changes such as different breeds of 
pigs and the epidemiology of the disease in different countries 
and periods from which the sequences are derived. The breeds 
currently used in intensive farming of pigs are genetically 
diverse and result from strong artificial selection pressure, even 
with substantial differences in the immune system (1). The low 
levels or absence of positive selective pressure in groups A, B 
and C may be related to the nature of the sequences that form 
these clusters, which were derived from wild boars. PR clinical
signs are extremely rare in this species because the primary
mode of transmission is venereal (27), and the strains isolated 
are usually of low virulence (20). 

Epidemiological characteristics can also be an influential 
factor. The rate of diversity and positive selective pressure tend 
to increase when there is horizontal transmission of the virus 
and a shorter generation time (6). Unlike the samples from 
groups A, B and C (from wildlife or countries with advanced 
eradication programs), the sequences that form groups D and E 
came from viruses isolated from domestic pigs in periods of 
high prevalence of the disease. The Brazilian samples came 
from a region of high production of isolated pigs and from a 
period with high rates of infection before eradication efforts 
(23). Chinese swine also suffered a series of outbreaks of PR, 
which implies a larger population and higher speed of 
transmission (35).

The variations found in gC ranged from amino acids 
considered intolerable by SIFT to changes in the hydrophilicity 
profile. No HBD was affected by amino acid substitutions. The 
signal peptide, which is important in glycoprotein localization 
during virion assembly, did not show any substitution in any 
sequence, not even in the Bartha sample, which was previously 
reported to carry mutations at codons 12 and 14 (25). 
Substitutions at N-glycosylation sites did not cause a loss or 
change in position, only resulting in weak positive pressure 
areas. The majority of these sites are located in strong negative 
pressure areas, increasing their preservation. 

The largest modification is located at the
hot spot, where the group B
profile shows a complete substitution of the
region and an insertion (from VVVE in group A to ALDDD).
This change modifies the hydrophilicity profile, which would 
cause the region to be more antigenically exposed (15), in 
addition to including a disordered region. The presence of 
disordered regions in gC may also contribute to genetic 
variation (8). 

The results presented in this work suggest that the use of 
Bayesian analysis for reconstructing phylogenetic trees is 
helpful for molecular analysis and epidemiological studies and 
allows for comparison with the results of previous studies. It 
was also possible to divide SuHV-1 isolates into five genotypes 
that evolved under different selective pressures. These 
genotypes are not specific to countries or continents, perhaps 
due to multiple introductions related to the importation of 
swine. 